
    Distribution of graph-distances in Boltzmann ensembles of RNA secondary structures

    Large RNA molecules often carry multiple functional domains whose spatial arrangement is an important determinant of their function. Pre-mRNA splicing, furthermore, relies on the spatial proximity of the splice junctions, which can be separated by very long introns. Similar effects appear in the processing of RNA virus genomes. Albeit a crude measure, the distribution of spatial distances in thermodynamic equilibrium therefore provides useful information on the overall shape of the molecule and can provide insights into the interplay of its functional domains. Spatial distance can be approximated by the graph-distance in RNA secondary structure. We show here that the equilibrium distribution of graph-distances between arbitrary nucleotides can be computed in polynomial time by means of dynamic programming. A naive implementation would yield recursions with a very high time complexity of O(n^11). Although we were able to reduce this to O(n^6) for many practical applications, a further reduction seems difficult. We conclude, therefore, that sampling approaches, which are much easier to implement, are also theoretically favorable for most real-life applications, in particular since these primarily concern long-range interactions in very large RNA molecules. Comment: Peer-reviewed and presented as part of the 13th Workshop on Algorithms in Bioinformatics (WABI2013)

    Understanding the errors of SHAPE-directed RNA structure modeling

    Single-nucleotide-resolution chemical mapping for structured RNA is being rapidly advanced by new chemistries, faster readouts, and coupling to computational algorithms. Recent tests have shown that selective 2'-hydroxyl acylation by primer extension (SHAPE) can give near-zero error rates (0-2%) in modeling the helices of RNA secondary structure. Here, we benchmark the method using six molecules for which crystallographic data are available: tRNA(phe) and 5S rRNA from Escherichia coli, the P4-P6 domain of the Tetrahymena group I ribozyme, and ligand-bound domains from riboswitches for adenine, cyclic di-GMP, and glycine. SHAPE-directed modeling of these highly structured RNAs gave an overall false negative rate (FNR) of 17% and a false discovery rate (FDR) of 21%, with at least one helix prediction error in five of the six cases. Extensive variations of data processing, normalization, and modeling parameters did not significantly mitigate modeling errors. Only one variation, filtering out data collected with deoxyinosine triphosphate during primer extension, gave a modest improvement (FNR = 12% and FDR = 14%). The residual structure modeling errors are explained by the insufficient information content of these RNAs' SHAPE data, as evaluated by a nonparametric bootstrapping analysis. Beyond these benchmark cases, bootstrapping suggests a low level of confidence (<50%) in the majority of helices in a previously proposed SHAPE-directed model for the HIV-1 RNA genome. Thus, SHAPE-directed RNA modeling is not always unambiguous, and helix-by-helix confidence estimates, as described herein, may be critical for interpreting results from this powerful methodology. Comment: Biochemistry, Article ASAP (Aug. 15, 2011)

    R2R - software to speed the depiction of aesthetic consensus RNA secondary structures

    Background: With continuing identification of novel structured noncoding RNAs, there is an increasing need to create schematic diagrams showing the consensus features of these molecules. RNA structural diagrams are typically made either with general-purpose drawing programs like Adobe Illustrator, or with automated or interactive programs specific to RNA. Unfortunately, the use of applications like Illustrator is extremely time consuming, while existing RNA-specific programs produce figures that are useful, but usually not of the same aesthetic quality as those produced at great cost in Illustrator. Additionally, most existing RNA-specific applications are designed for drawing single RNA molecules, not consensus diagrams. Results: We created R2R, a computer program that facilitates the generation of aesthetic and readable drawings of RNA consensus diagrams in a fraction of the time required with general-purpose drawing programs. Since the inference of a consensus RNA structure typically requires a multiple-sequence alignment, the R2R user annotates the alignment with commands directing the layout and annotation of the RNA. R2R creates SVG or PDF output that can be imported into Adobe Illustrator, Inkscape or CorelDRAW. R2R can be used to create consensus sequence and secondary structure models for novel RNA structures or to revise models when new representatives for known RNA classes become available. Although R2R does not currently have a graphical user interface, it has proven useful in our efforts to create 100 schematic models of distinct noncoding RNA classes. Conclusions: R2R makes it possible to obtain high-quality drawings of the consensus sequence and structural models of many diverse RNA structures with a more practical amount of effort. R2R software is available at http://breaker.research.yale.edu/R2R and as an Additional file.

    Characterization of RNase MRP RNA and novel snoRNAs from Giardia intestinalis and Trichomonas vaginalis

    Background: Eukaryotic cells possess a complex network of RNA machineries that function in RNA processing and cellular regulation, including transcription, translation, silencing, editing and epigenetic control. Studies of model organisms have shown that many ncRNAs of the RNA-infrastructure are highly conserved, but little is known about non-model protists. In this study we have conducted a genome-scale survey of medium-length ncRNAs from the protozoan parasites Giardia intestinalis and Trichomonas vaginalis. Results: We have identified the previously 'missing' Giardia RNase MRP RNA, which is a key ribozyme involved in pre-rRNA processing. We have also uncovered 18 new H/ACA box snoRNAs, expanding our knowledge of the H/ACA family of snoRNAs. Conclusions: Results indicate that Giardia intestinalis and Trichomonas vaginalis, like their distant multicellular relatives, contain a rich infrastructure of RNA-based processing. From here we can investigate the evolution of RNA processing networks in eukaryotes.

    Pseudouridine profiling reveals regulated mRNA pseudouridylation in yeast and human cells

    Post-transcriptional modification of RNA nucleosides occurs in all living organisms. Pseudouridine, the most abundant modified nucleoside in non-coding RNAs, enhances the function of transfer RNA and ribosomal RNA by stabilizing the RNA structure. Messenger RNAs were not known to contain pseudouridine, but artificial pseudouridylation dramatically affects mRNA function: it changes the genetic code by facilitating non-canonical base pairing in the ribosome decoding centre. However, without evidence of naturally occurring mRNA pseudouridylation, its physiological relevance was unclear. Here we present a comprehensive analysis of pseudouridylation in Saccharomyces cerevisiae and human RNAs using Pseudo-seq, a genome-wide, single-nucleotide-resolution method for pseudouridine identification. Pseudo-seq accurately identifies known modification sites as well as many novel sites in non-coding RNAs, and reveals hundreds of pseudouridylated sites in mRNAs. Genetic analysis allowed us to assign most of the new modification sites to one of seven conserved pseudouridine synthases, Pus1–4, 6, 7 and 9. Notably, the majority of pseudouridines in mRNA are regulated in response to environmental signals, such as nutrient deprivation in yeast and serum starvation in human cells. These results suggest a mechanism for the rapid and regulated rewiring of the genetic code through inducible mRNA modifications. Our findings reveal unanticipated roles for pseudouridylation and provide a resource for identifying the targets of pseudouridine synthases implicated in human disease. American Cancer Society (Robbie Sue Mudd Kidney Cancer Research Scholar Grant RSG-13-396-01-RMC); National Institutes of Health (U.S.) (GM094303); National Institutes of Health (U.S.) (GM081399); American Cancer Society. New England Division (Ellison Foundation Postdoctoral Fellowship); American Cancer Society (Postdoctoral Fellowship PF-13-319-01-RMC); National Institutes of Health (U.S.) (Pre-doctoral Training Grant T32GM007287)

    COMRADES determines in vivo RNA structures and interactions.

    The structural flexibility of RNA underlies fundamental biological processes, but there are no methods for exploring the multiple conformations adopted by RNAs in vivo. We developed cross-linking of matched RNAs and deep sequencing (COMRADES) for in-depth RNA conformation capture, and a pipeline for the retrieval of RNA structural ensembles. Using COMRADES, we determined the architecture of the Zika virus RNA genome inside cells, and identified multiple site-specific interactions with human noncoding RNAs. This work was supported by Cancer Research UK (C13474/A18583, C6946/A14492) and the Wellcome Trust (104640/Z/14/Z, 092096/Z/10/Z) to E.A.M. O.Z. was supported by the Human Frontier Science Program (HFSP, LT000558/2015), the European Molecular Biology Organization (EMBO, ALTF1622-2014), and the Blavatnik Family Foundation postdoctoral fellowship. G.K. and M.G. were supported by Wellcome Trust grant 207507 and UK Medical Research Council. A.T.L.L. and J.C.M. were supported by core funding from Cancer Research UK (award no. 17197 to J.C.M.). J.C.M. was also supported by core funding from EMBL. I.G. and L.W.M. were supported by the Wellcome Trust Senior Fellowship in Basic Biomedical Science to I.G. (207498/Z/17/Z). I.J.M., L.F.G. and J.S.-G. were supported by grants R01GM104475 and R01GM115649 from NIGMS. C.K.K. was supported by City University of Hong Kong Projects 9610363 and 7200520, Croucher Foundation Project 9500030 and Hong Kong RGC Projects 9048103 and 9054020. C.-F.Q. was supported by the NSFC Excellent Young Scientist Fund 81522025 and the Newton Advanced Fellowship from the Academy of Medical Sciences, UK.

    The Short Non-Coding Transcriptome of the Protozoan Parasite Trypanosoma cruzi

    The pathway for RNA interference is widespread in metazoans and participates in numerous cellular tasks, from gene silencing to chromatin remodeling and protection against retrotransposition. The unicellular eukaryote Trypanosoma cruzi is missing the canonical RNAi pathway and is unable to induce RNAi-related processes. To further understand alternative RNA pathways operating in this organism, we have performed deep sequencing and genome-wide analyses of a size-fractioned cDNA library (16–61 nt) from the epimastigote life stage. Deep sequencing generated 582,243 short sequences of which 91% could be aligned with the genome sequence. About 95–98% of the aligned data (depending on the haplotype) corresponded to small RNAs derived from tRNAs, rRNAs, snRNAs and snoRNAs. The largest class consisted of tRNA-derived small RNAs which primarily originated from the 3′ end of tRNAs, followed by small RNAs derived from rRNA. The remaining sequences revealed the presence of 92 novel transcribed loci, of which 79 did not show homology to known RNA classes.

    Disease-Associated Mutations That Alter the RNA Structural Ensemble

    Genome-wide association studies (GWAS) often identify disease-associated mutations in intergenic and non-coding regions of the genome. Given the high percentage of the human genome that is transcribed, we postulate that for some observed associations the disease phenotype is caused by a structural rearrangement in a regulatory region of the RNA transcript. To identify such mutations, we have performed a genome-wide analysis of all known disease-associated Single Nucleotide Polymorphisms (SNPs) from the Human Gene Mutation Database (HGMD) that map to the untranslated regions (UTRs) of a gene. Rather than using minimum free energy approaches (e.g. mFold), we use a partition function calculation that takes into consideration the ensemble of possible RNA conformations for a given sequence. We identified in the human genome disease-associated SNPs that significantly alter the global conformation of the UTR to which they map. For six disease-states (Hyperferritinemia Cataract Syndrome, β-Thalassemia, Cartilage-Hair Hypoplasia, Retinoblastoma, Chronic Obstructive Pulmonary Disease (COPD), and Hypertension), we identified multiple SNPs in UTRs that alter the mRNA structural ensemble of the associated genes. Using a Boltzmann sampling procedure for sub-optimal RNA structures, we are able to characterize and visualize the nature of the conformational changes induced by the disease-associated mutations in the structural ensemble. We observe in several cases (specifically the 5′ UTRs of FTL and RB1) SNP-induced conformational changes analogous to those observed in bacterial regulatory Riboswitches when specific ligands bind. We propose that the UTR and SNP combinations we identify constitute a “RiboSNitch,” that is a regulatory RNA in which a specific SNP has a structural consequence that results in a disease phenotype. Our SNPfold algorithm can help identify RiboSNitches by leveraging GWAS data and an analysis of the mRNA structural ensemble.

    Skipping of Exons by Premature Termination of Transcription and Alternative Splicing within Intron-5 of the Sheep SCF Gene: A Novel Splice Variant

    Stem cell factor (SCF) is a growth factor essential for haemopoiesis, mast cell development and melanogenesis. In the hematopoietic microenvironment (HM), SCF is produced in either membrane-bound (−) or soluble (+) form. Skin expression of SCF stimulates melanocyte migration, proliferation, differentiation, and survival. We report, for the first time, a novel mRNA splice variant of SCF from the skin of white merino sheep via cloning and sequencing. Reverse transcriptase (RT)-PCR and molecular prediction revealed two different cDNA products of SCF. Full-length cDNA libraries were enriched by the method of rapid amplification of cDNA ends (RACE-PCR). Nucleotide sequencing and molecular prediction revealed that the primary 1519 base pair (bp) cDNA encodes a precursor protein of 274 amino acids (aa), commonly known as the ‘soluble’ isoform. In contrast, the shorter (835 and/or 725 bp) cDNA was found to be a ‘novel’ mRNA splice variant. It contains an open reading frame (ORF) corresponding to a truncated protein of 181 aa (vs 245 aa) with a unique C-terminus lacking the primary proteolytic segment (28 aa) right after the D175G site, which is necessary to produce the ‘soluble’ form of SCF. This alternative splice (AS) variant was explained by the complete nucleotide sequencing of the splice junction covering exon 5-intron (5)-exon 6 (948 bp) with a premature termination codon (PTC) whereby exons 6 to 9/10 are skipped (Cassette Exon, CE 6–9/10). We also demonstrated that the Northern blot analysis at transcript level is mediated via an intron-5 splicing event. Our data refine the structure of the SCF gene and clarify the presence (+) and/or absence (−) of primary proteolytic-cleavage site specific SCF splice variants. This work provides a basis for understanding the functional role and regulation of SCF in hair follicle melanogenesis in sheep beyond what was known in mice, humans and other mammals.